A novel bioinformatics tool for phylogenetic classification of genomic sequence fragments derived from mixed genomes of uncultured environmental microbes

نویسندگان

  • Takashi Abe
  • Hideaki Sugawara
  • Shigehiko Kanaya
  • Toshimichi Ikemura
چکیده

A Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional complex data on a two-dimensional map. We modified the conventional SOM to genome informatics, making the learning process and resulting map independent of the order of data input, and developed a novel bioinformatics tool for phylogenetic classification of sequence fragments obtained from pooled genome samples of microorganisms in environmental samples allowing visualization of microbial diversity and the relative abundance of microorganisms on a map. First we constructed SOMs of triand tetranucleotide frequencies from a total of 3.3-Gb of sequences derived using 113 prokaryotic and 13 eukaryotic genomes, for which complete genome sequences are available. SOMs classified the 330000 10-kb sequences from these genomes mainly according to species without information on the species. Importantly, classification was possible without orthologous sequence sets and thus was useful for studies of novel sequences from poorly characterized species such as those living only under extreme conditions and which have attracted wide scientific and industrial attention. Using the SOM method, sequences that were derived from a single genome but cloned independently in a metagenome library could be reassociated in silico. The usefulness of SOMs in metagenome studies was also discussed. key words: Self-Organizing Map, uncultured microorganism, phylogenetic classification, environmental microorganism, metagenome 103 National Institute of Polar Research NII-Electronic Library Service (Amann et al., 1995; Hugenholtz and Pace, 1996; DeLong, 2002). In the metagenomic approach, genomic DNAs are extracted directly from an environmental sample that contains multiple organisms, and the DNA fragments are cloned and sequenced. This is a powerful strategy for comprehensive analysis of biodiversity in an ecosystem including polar regions. However, with a simple collection of many sequence fragments, it is difficult to predict from what phylotypes individual sequences are derived. This is because the conventional phylogenetic classification of genomic sequences is based on sequence homology searches, which require orthologous sequence sets; and therefore, this strategy cannot be applied to poorly characterized or novel gene sequences. We developed a new phylogenetic classification method on the basis of Kohonen’s Self-Organizing Map (SOM) (Kohonen, 1982, 1990; Kohonen et al., 1996), by modifying the SOM for genome informatics (Kanaya et al., 1998, 2001; Abe et al., 2002, 2003). In the present study, the SOM method was optimized for phylogenetic classification of genomic sequences from environmental samples.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Comparative bioinformatics analysis of a wild diploid Gossypium with two cultivated allotetraploid species

Background: Gossypium thurberi is a wild diploid species that has been used to improve cultivated allotetraploid cotton. G. thurberi belongs to D genome, which is an important wild bio-source for the cotton breeding and genetic research. To a certain degree, chloroplast DNA sequence information are a versatile tool for species identification and phylogenetic implications in plants. Different ch...

متن کامل

A Novel Bioinformatics Strategy to Analyze Microbial Big Sequence Data for Efficient Knowledge Discovery: Batch-Learning Self-Organizing Map (BLSOM)

With the remarkable increase of genomic sequence data of microorganisms, novel tools are needed for comprehensive analyses of the big sequence data available. The self-organizing map (SOM) is an effective tool for clustering and visualizing high-dimensional data, such as oligonucleotide composition on one map. By modifying the conventional SOM, we developed batch-learning SOM (BLSOM), which all...

متن کامل

A Novel Bioinformatics Method for Efficient Knowledge Discovery by BLSOM from Big Genomic Sequence Data

With remarkable increase of genomic sequence data of a wide range of species, novel tools are needed for comprehensive analyses of the big sequence data. Self-Organizing Map (SOM) is an effective tool for clustering and visualizing high-dimensional data such as oligonucleotide composition on one map. By modifying the conventional SOM, we have previously developed Batch-Learning SOM (BLSOM), whi...

متن کامل

A Novel Bioinformatics Strategy for Function Prediction of Poorly-Characterized Protein Genes Obtained from Metagenome Analyses

As a result of remarkable progresses of DNA sequencing technology, vast quantities of genomic sequences have been decoded. Homology search for amino acid sequences, such as BLAST, has become a basic tool for assigning functions of genes/proteins when genomic sequences are decoded. Although the homology search has clearly been a powerful and irreplaceable method, the functions of only 50% or few...

متن کامل

Environmental Genomics: Exploring Ecological Sequence Space

The ability to extract and characterize genomic DNA fragments from mixed microbial assemblages is providing novel insights into the ecology, evolution, and metabolism of uncultured microorganisms in nature.

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017